
(NIPS 2018 Oral) Generalisation of structural knowledge in the Hippocampal-Entorhinal system

Keywords: [Hippocampal] [Cortex] [Hebbian Memory] [Hippocampal-Entorhinal System]

Whittington J, Muller T, Mark S, et al. Generalisation of structural knowledge in the hippocampal-entorhinal system[C]//Advances in Neural Information Processing Systems. 2018: 8484-8495.



1. Overview


1.1. Motivation

  • a central problem to understanding intelligence is the concept of generalisation
  • hippocampal-entorhinal system is known to be important for generalisation

This paper proposes that, to generalise structural knowledge, representations of the structure of the world (how entities in the world relate to each other) need to be separated from representations of the entities themselves

  • ANN embedded with hierarchy and fast Hebbian memory
  • representations can effectively utilise memories
  • shows a preserved relationship between entorhinal grid cells and hippocampal place cells across environments
  • explicitly represented structure can be combined with sensory information in a conjunctive code unique to each environment. Sensory observations are thus fit into previously learned structural knowledge, leading to generalisation

1.2. Contribution

1.2.1. Neuroscience

  • find an interpretation of grid cells, place cells and remapping that offers a mechanistic understanding for the hippocampal involvement in generalisation of knowledge across domains
  • results suggest spatial representations found in the brain may be an instance of a more general coding mechanism organising knowledge across multiple domains

1.2.2. Machine Learning

  • build a network where fast Hebbian learning interacts with slow statistical learning
    • this allows the network to learn representations whereby memories are not only stored in a Hebbian network for one-shot retrieval within a domain, but also benefit from statistical knowledge shared across domains - allowing zero-shot inference

1.3. Generally



  • implement its proposal in an ANN tasked with predicting sensory observations when walking on 2D graph worlds, where each vertex is associated with a sensory experience
  • to make accurate predictions, the agent must learn the underlying hidden structure of the graphs
  • learning is unsupervised: the network receives only sensory observations and actions
  • place cells form a conjunctive representation of sensory identity and structure. This conjunctive representation forms a Hebbian memory, which bridges structure and identity, allowing the same structural code to be reused across environments
  • combine fast Hebbian learning of episodic memories, with gradient descent which slowly learns to extract statistics of these memories

1.4. Details

  • propose that the statistics of memories in hippocampus are extracted by cortex
  • propose that future hippocampal representations/memories are constrained to be consistent with the learned structural knowledge
  • choose memory storage and addressing to be computationally and biologically plausible (rather than using other types of differentiable memory more akin to RAM), as well as using hierarchical processing. This enables the model to discover representations that are useful both for navigation and for addressing memories

1.5. In Neuroscience

  • generalisation of statistical structure (the relationships between objects in the world) imbues an agent with the ability to fit things/concepts together that share the same statistical structure, but differ in the particularities
  • hippocampus is known to be important for generalisation, memory, problems of causality, inferential reasoning, transitive reasoning, conceptual knowledge representation, one-shot imagination and navigation
  • in spatial navigation there is a good understanding of neuronal representations in both the hippocampus (place cells, landmark cells) and the medial entorhinal cortex (grid cells, border cells, object vector cells)
  • place cells and grid cells have had a radical impact on neuroscience, leading to the 2014 Nobel Prize in Physiology or Medicine
  • place and grid cells are similar in that they have a stable firing pattern for specific regions of space
  • place cells fire only in a single location (or a few locations) in a given environment, whereas grid cells fire in a regular lattice pattern tiling the space. These cells cemented the idea of a ‘cognitive map‘, where an animal holds an internal representation of the space it navigates
  • other entorhinal cell types (border, object vector cells) appear to have disparate roles in coding space
  • remapping (traditionally thought to be random) → the place cell code differs between two structurally identical environments



2. Model




  • consider an agent passively moving on a 2D graph, observing a non-unique sensory stimulus (an image) on each vertex
  • if the agent wishes to understand its environment, it should maximise its model’s probability of observing each stimulus
  • trained on many environments sharing the same structure (2D graph)
  • one approach to this problem: have an abstract representation of space encoding relative locations, and then place a memory of what stimulus was observed at that (relative) location
  • since the agent understands where it is in space, this allows for accurate state predictions to previously visited nodes even if the agent has never travelled along that particular edge before (Figure 2c)
  • grid cell as base for constructing abstract representation of space
  • place cell representations for the formation of fast episodic memories
  • posit that this (place cells forming a conjunction) is done hierarchically across spatial frequencies, such that higher-frequency statistics can be reused repeatedly across space. This reduces the number of weights that need to be learnt
  • grid cells to be recurrent through time
  • view the hippocampal-entorhinal system as one that performs inference
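The setup above can be sketched concretely. The following is a minimal, hypothetical sketch of one "world": a grid graph whose vertices carry non-unique sensory identities, with an agent taking discrete actions; sizes and names are illustrative, not the paper's exact configuration.

```python
import numpy as np

def make_world(n=8, n_s=10, seed=0):
    """One environment: an n x n grid where each vertex holds one of
    n_s (non-unique) sensory identities."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_s, size=(n, n))

def step(pos, action, n):
    """Move on the grid; the paper's action set is up/down/left/right/stay."""
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1),
             "right": (0, 1), "stay": (0, 0)}
    di, dj = moves[action]
    i = min(max(pos[0] + di, 0), n - 1)   # clamp at the borders
    j = min(max(pos[1] + dj, 0), n - 1)
    return (i, j)

world = make_world()
pos = step((0, 0), "right", 8)
obs = world[pos]  # sensory observation at the new vertex
```

Because the same stimulus can appear at many vertices, the agent cannot predict from identity alone; it must infer where it is on the graph.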

2.1. Model Summary

  • the model is a neural network and learns structure across tasks
  • optimise end-to-end via backpropagation through time
  • the central (attractor) network employs Hebbian learning to rapidly remember the conjunction
  • a generative temporal model learns how to use the Hebbian memory most efficiently given the common statistics of transitions across worlds

2.2. Notation



  • a layer of activations with vector notations


  • Element
  • s. index for sensory
  • j. index for phases


2.3. Generative Model



  • g. grid cell
  • p. place cell
  • M. agent’s memory
  • a. action
  • Θ. parameters of generative model
  • x. one-hot vector where each of its n_s elements represent a sensory identity
  • g&p (learned instead of hard-coded). come in different frequencies (hierarchies) indexed by superscript f

2.3.1. Grid Cells

  • to predict where we will be, we can transition from our current location based on our heading (path integration, Fig 2c)



  • f. functions specific to the distribution in question

  • connections in D_a are from low frequency to the same or higher frequency only (or alternatively only within frequency).
  • separate into hierarchical scales so that high frequency statistics can be reused across lower frequency statistics, i.e. learning and knowledge is reused across space
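A minimal sketch of this transition, assuming a form g_t = f(D_a g_{t-1}) with one weight matrix per action and a block mask enforcing the frequency constraint (the exact parameterisation and the direction convention for "low to high frequency" here are illustrative):

```python
import numpy as np

n_f = [10, 10, 8, 6, 6]          # grid cells per frequency (from the paper)
n_g = sum(n_f)
starts = np.cumsum([0] + n_f)    # block boundaries per frequency
rng = np.random.default_rng(0)

# Block mask: target block i receives input only from blocks j >= i
# (one convention for "same or lower frequency"; illustrative only).
mask = np.zeros((n_g, n_g))
for i in range(len(n_f)):
    for j in range(i, len(n_f)):
        mask[starts[i]:starts[i+1], starts[j]:starts[j+1]] = 1.0

# One masked transition matrix D_a per action (5 actions in the paper).
D_a = {a: rng.normal(0, 0.1, (n_g, n_g)) * mask for a in range(5)}

def transition(g_prev, action):
    # f chosen as tanh purely for illustration
    return np.tanh(D_a[action] @ g_prev)

g = transition(rng.normal(size=n_g), 0)
```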


2.3.2. Place Cells



  • for retrieving memories
  • stored memories are extracted via an attractor network (Fig 2b) using grid cells as input - i.e. grid cells act as an index for memory extraction



2.3.3. Data



  1. categorical distribution


  • sum over phases
  • f_c*. MLP
  • f* is chosen to be 0 (i.e. only the highest frequency is included)
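The decoding step above can be sketched as a small MLP mapping (highest-frequency) place cells to a categorical distribution over sensory identities. Shapes and the MLP architecture here are assumptions for illustration, not the paper's exact f_c.

```python
import numpy as np

rng = np.random.default_rng(0)
n_p, n_s = 100, 10                       # place cells, sensory identities (illustrative)
W1, b1 = rng.normal(0, 0.1, (32, n_p)), np.zeros(32)
W2, b2 = rng.normal(0, 0.1, (n_s, 32)), np.zeros(n_s)

def predict_x(p):
    """Categorical distribution p(x | place cells) via a small MLP + softmax."""
    h = np.tanh(W1 @ p + b1)
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())    # stable softmax
    return e / e.sum()

probs = predict_x(rng.normal(size=n_p))
```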

2.4. Inference Network



  • the posterior is intractable, so it is approximated by



  • phi. parameters of the inference network

  • learn Θ and phi by maximising the ELBO within the VAE framework
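Schematically, this objective takes the standard ELBO form (notation here is a simplified sketch; the paper's exact factorisation over time steps and frequencies is more involved):

$$
\mathcal{L}(\Theta, \phi) \;=\; \mathbb{E}_{q_\phi(g,\,p \,\mid\, x,\,a)}\!\left[\log p_\Theta(x \mid p)\right] \;-\; D_{\mathrm{KL}}\!\left( q_\phi(g,\,p \mid x,\,a) \,\middle\|\, p_\Theta(g,\,p \mid a,\,M) \right)
$$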

2.4.1. Place Cells



2.4.2. Grid Cells



2.5. Hebbian Memories

  • when enter a new environment, memory is reset to be empty (zeros)
  • memories of place cell representations are stored in Hebbian weights between place cells (M_t)
  • allow rapid learning when entering a new environment


  • p^. place cells generated from inferred grid cells
  • λ&η. the rate of forgetting and remembering
  • connections from high to low frequencies are set to zero, so that memories are retrieved hierarchically
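A minimal sketch of the update, assuming an outer-product Hebbian rule combining the inferred place cells p and the place cells p̂ generated from inferred grid cells, with forgetting rate λ and learning rate η (the exact outer-product terms are one plausible form, not a verbatim reproduction of the paper's equation; the high-to-low frequency mask is omitted for brevity):

```python
import numpy as np

n_p = 100                     # number of place cells (illustrative)
lam, eta = 0.99, 0.5          # forgetting and remembering rates (illustrative values)

def hebbian_update(M, p, p_hat):
    """One Hebbian memory step: decay old memories, store the new
    conjunction between inferred (p) and generated (p_hat) place cells."""
    dM = np.outer(p - p_hat, p + p_hat)
    return lam * M + eta * dM

rng = np.random.default_rng(0)
M = np.zeros((n_p, n_p))      # memory is reset on entering a new environment
M = hebbian_update(M, rng.normal(size=n_p), rng.normal(size=n_p))
```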


  • best results. when two separate matrices were used



  • x̂. retrieved memory, with the sensorium as input to the attractor network

2.5.1. Retrieval

  • attractor network



  • τ. iteration of the network

  • α. decay term
  • h_0. input, from grid cells or sensorium (depending on for generative or inference), dimensions scaled appropriately
  • output. retrieved memory (place cell code)
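Retrieval can be sketched as iterating a decaying recurrence over the memory matrix until the activity settles; the nonlinearity and constants below are assumptions for illustration.

```python
import numpy as np

def retrieve(M, h0, alpha=0.7, n_iter=20):
    """Attractor retrieval: iterate h <- f(alpha*h + M @ h) from input h0
    (grid cells in the generative direction, sensorium in inference).
    The settled state is the retrieved place cell code."""
    h = h0.copy()
    for _ in range(n_iter):
        h = np.tanh(alpha * h + M @ h)   # alpha: decay on the previous state
    return h

rng = np.random.default_rng(0)
n_p = 100
M = np.zeros((n_p, n_p))                 # empty memory: activity simply decays
p = retrieve(M, rng.normal(size=n_p))
```

With stored memories in M, the recurrence instead pulls the state toward the nearest stored place cell pattern; with an empty memory it decays toward zero, as here.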

2.6. Model Implication

  • believe that using more biologically realistic computational mechanisms (Hebbian Memory instead of LSTM) will facilitate further incorporation of neuroscience-inspired phenomena, such as successor representations or replay

2.7. Details

  • although a Bayesian formulation is presented, the best results were obtained by using only the means of the above distributions


  • first item. cross entropy loss
  • other item. squared error loss between inferred and generated variables
  • 5 different frequencies. n_f as [10, 10, 8, 6, 6]
  • environment square. [8, 10, 12]
  • agent changes to a new environment after 2000~5000 steps
  • a_t. up, down, left, right, stay still
  • time truncated to 25 steps
  • two separate memory matrices. use additional memory module in grid cell inference
  • typically, after 200-300 environments the agent has fully learned the structure (~50000 gradient updates)
  • remove a_t from the generative model so that the generative model can more easily capture the true underlying transition statistics
  • place-like representations are learned in the attractor
  • grid-like representations are learned in the generative temporal model
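The composite objective described in the bullets above (a cross-entropy term on the predicted sensory identity, plus squared-error terms tying inferred and generated variables together) can be sketched as follows; variable names and shapes are illustrative.

```python
import numpy as np

def loss(probs, x_onehot, g_inf, g_gen, p_inf, p_gen):
    """Cross-entropy on the sensory prediction plus squared error
    between inferred and generated grid/place variables."""
    ce = -np.sum(x_onehot * np.log(probs + 1e-12))
    sq = np.sum((g_inf - g_gen) ** 2) + np.sum((p_inf - p_gen) ** 2)
    return ce + sq

rng = np.random.default_rng(0)
probs = np.full(10, 0.1)            # uniform prediction over 10 identities
x = np.zeros(10); x[3] = 1.0        # true identity as a one-hot vector
L = loss(probs, x, rng.normal(size=40), rng.normal(size=40),
         rng.normal(size=100), rng.normal(size=100))
```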

2.8. graphic



For the grid cells of a certain frequency (Inference)

  1. (1, 30) downsample to (1, 10)
  2. combined with (1, 10) sensory cell to get (1, 100) cells
  • (Constrain) connections from high to low frequency are set to 0



3. Experiments




For Fig 4b middle and right

  • grid representations that are shifted versions of each other, as in the brain
  • the separation into different phases (same frequency) means that two conjunctive place cells that respond to the same stimulus, will not necessarily be active simultaneously - each cell will only be active when their corresponding grid phase is active
  • thus one can uniquely code for the same stimulus in many different locations
  • Across two environments, a given stimulus may occur at the same grid phase but at a different location